Crawl Me Maybe: Iterative Linked Dataset Preservation
Abstract
The abundance of Linked Data being published, updated, and interlinked calls for strategies to preserve datasets in a scalable way. In this paper, we propose a system that iteratively crawls linked datasets and captures their evolution based on flexible crawl definitions. The captured deltas of a dataset are decomposed into two conceptual sets: the evolution of (i) metadata and (ii) the actual data, covering schema-level and instance-level statements. The changes are represented as logs that record three main operations: insertions, updates, and deletions. Crawled data is stored in a relational database for efficiency, while the diffs of a dataset and its live version are exposed in RDF format.
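To make the change-log idea concrete, the following is a minimal sketch of how two crawled snapshots of a dataset could be compared and the delta classified into insertions, updates, and deletions. It is not the system described in the paper. The names `ChangeLog` and `diff_snapshots` are hypothetical, triples are modelled as plain string tuples, and the rule used to recognise an update (exactly one removed and one added triple sharing the same subject and predicate) is an assumption made for illustration only.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple

# A triple is modelled as (subject, predicate, object) strings (assumption).
Triple = Tuple[str, str, str]


class ChangeLog(NamedTuple):
    """Hypothetical container for the three operations named in the abstract."""
    insertions: List[Triple]
    updates: List[Tuple[Triple, Triple]]  # (old triple, new triple)
    deletions: List[Triple]


def diff_snapshots(old: Iterable[Triple], new: Iterable[Triple]) -> ChangeLog:
    """Compare two crawl snapshots and classify the differences."""
    old_set, new_set = set(old), set(new)
    removed = old_set - new_set  # present before, missing now
    added = new_set - old_set    # missing before, present now

    # Group changed triples by their (subject, predicate) pair.
    removed_by_key = defaultdict(list)
    for triple in sorted(removed):
        removed_by_key[triple[:2]].append(triple)
    added_by_key = defaultdict(list)
    for triple in sorted(added):
        added_by_key[triple[:2]].append(triple)

    insertions: List[Triple] = []
    updates: List[Tuple[Triple, Triple]] = []
    deletions: List[Triple] = []
    for key in sorted(set(removed_by_key) | set(added_by_key)):
        gone = removed_by_key.get(key, [])
        came = added_by_key.get(key, [])
        if len(gone) == 1 and len(came) == 1:
            # Assumed rule: one value replaced by another counts as an update.
            updates.append((gone[0], came[0]))
        else:
            # Anything else is logged as independent deletions and insertions.
            deletions.extend(gone)
            insertions.extend(came)
    return ChangeLog(insertions, updates, deletions)
```

Calling `diff_snapshots(previous_crawl, current_crawl)` on two iterables of triples returns one log per crawl iteration. Splitting that log into metadata changes and data changes, as the abstract describes, would need an additional rule (for example, based on which predicates count as metadata) that the abstract does not specify.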
Similar resources
Bayesian and Iterative Maximum Likelihood Estimation of the Coefficients in Logistic Regression Analysis with Linked Data
This paper considers logistic regression analysis with linked data. It is shown that, in logistic regression analysis with linked data, a finite mixture of Bernoulli distributions can be used for modeling the response variables. We propose an iterative maximum likelihood estimator for the regression coefficients that takes the matching probabilities into account. Next, the Bayesian counterpart...
Explicit and Implicit Schema Information on the Linked Open Data Cloud: Joined Forces or Antagonists?
Schema information about resources in the Linked Open Data (LOD) cloud can be provided in two ways: explicitly, by attaching RDF types to the resources, or implicitly, via the definition of the resources’ properties. In this paper, we analyze the correlation between the two sources of schema information. To this end, we have extracted schema information regar...
Comparing Topic Coverage in Breadth-First and Depth-First Crawls Using Anchor Texts
Web archives preserve the fast-changing Web by repeatedly crawling its content. The crawling strategy influences which data is archived. We use the link anchor text of two Web crawls created with different crawling strategies to compare their coverage of past popular topics. One of our crawls was collected by the National Library of the Netherlands (KB) using a depth-first strat...
An Improved Non-Iterative Privacy Preservation Lotteries
In 2009, a non-iterative privacy preservation scheme for online lotteries was proposed in IET Information Security by J.S. Lee, C.S. Chan, and C.C. Chang [1], who claim that their scheme achieves the following properties: Privacy. No one can learn the choices made by lottery players except the players themselves. Security. No one can counterfeit a winner or forge a winning lottery ticket to claim the prize. ...
WebIsALOD: Providing Hypernymy Relations Extracted from the Web as Linked Open Data
Hypernymy relations are an important asset in many applications, and a central ingredient to Semantic Web ontologies. The IsA database is a large collection of such hypernymy relations extracted from the Common Crawl. In this paper, we introduce WebIsALOD, a Linked Open Data release of the IsA database, containing 400M hypernymy relations, each provided with rich provenance information. As the ...
Publication date: 2014